WIP: Fix incorrect parsing of bfrange (#631) #763

Draft: wants to merge 1 commit into master

nsfisis commented Feb 25, 2025

Type of pull request

  • Bug fix (involves code and configuration changes)
  • New feature (involves code and configuration changes)
  • Documentation update
  • Something else

About

fixes #631

Previously, the regular expression for the single-offset bfrange mapping unintentionally matched portions of the bracketed array form. This produced an extremely large code range, which exhausted the available memory.

Now, we capture the <from> and <to> ranges first and then distinguish between single-offset (<xxxx>) and array ([<xxxx> <xxxx> ...]) forms. This ensures that the single-offset regex won't accidentally match bracketed segments.
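
The two-step approach described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in this pull request: it is written in Python rather than the project's own language, the names `BFRANGE_ENTRY`, `HEX_TOKEN`, and `parse_bfrange` are hypothetical, and destinations are simplified to single integer code values.

```python
import re

# Step 1: capture <from> and <to>, then whatever follows them:
# either a single <offset> or a bracketed array [<xxxx> <xxxx> ...].
BFRANGE_ENTRY = re.compile(
    r"<([0-9a-fA-F]+)>\s*<([0-9a-fA-F]+)>\s*"
    r"(?:<([0-9a-fA-F]+)>|\[([^\]]*)\])"
)
HEX_TOKEN = re.compile(r"<([0-9a-fA-F]+)>")


def parse_bfrange(section):
    """Map source codes to destination code values for one bfrange body."""
    table = {}
    for m in BFRANGE_ENTRY.finditer(section):
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if m.group(3) is not None:
            # Single-offset form: destinations count up from the given base.
            base = int(m.group(3), 16)
            for i in range(end - start + 1):
                table[start + i] = base + i
        else:
            # Array form: one destination per source code, in order.
            # Extra array items beyond the <from>..<to> span are ignored.
            dests = HEX_TOKEN.findall(m.group(4))
            for i, dest in enumerate(dests[: end - start + 1]):
                table[start + i] = int(dest, 16)
    return table
```

Because each match consumes the whole bracketed array, the tokens inside the brackets are never rescanned as a separate `<from> <to> <offset>` triple, which is the failure mode described above.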

Checklist for code / configuration changes

See CONTRIBUTING.md for all essential information about contributing.

TODO

  • Write tests.

Question:
In order to write test code, I need to generate a PDF that contains arbitrary bfrange sections. However, I’m afraid I don’t know how to do that. Do you have any suggestions on how to create such a file?

k00ni (Collaborator) commented Mar 1, 2025

> In order to write test code, I need to generate a PDF that contains arbitrary bfrange sections. However, I’m afraid I don’t know how to do that. Do you have any suggestions on how to create such a file?

You don't need a real PDF file. Instead, create Font instances and set them up for each test case. To test how the parser handles bfrange sections, create strings containing both correct and faulty data and inject them into the instance. You might need instances of other classes as well to build a working case, but that seems reasonable and doable.
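
As a rough sketch of what such hand-written test strings could look like, the snippet below builds a few bfrange sections, covering both forms plus malformed input. It is an assumption-laden illustration: it is written in Python rather than the project's own language, the names `make_bfrange_section` and `FIXTURES` are hypothetical, and how the strings get injected into a Font instance is not shown because that API is not described in this thread.

```python
def make_bfrange_section(entries):
    """Wrap raw bfrange lines in the beginbfrange/endbfrange operators."""
    body = "\n".join(entries)
    return f"{len(entries)} beginbfrange\n{body}\nendbfrange"


# Hypothetical fixtures: one per form, plus inputs a parser should survive.
FIXTURES = {
    # Single-offset form: <from> <to> <offset>
    "single_offset": make_bfrange_section(["<0001> <0003> <0041>"]),
    # Array form: <from> <to> [<dst1> <dst2> ...]
    "array": make_bfrange_section(["<0001> <0003> [<0041> <0042> <0043>]"]),
    # Both forms in one section
    "mixed": make_bfrange_section([
        "<0001> <0003> [<0041> <0042> <0043>]",
        "<0010> <0012> <0061>",
    ]),
    # Array whose inner tokens would span a huge range if misread as
    # <from> <to> <offset>
    "array_large_values": make_bfrange_section(
        ["<0001> <0003> [<0041> <FFFF> <0043>]"]
    ),
    # Faulty data: the closing bracket is missing
    "unterminated_array": make_bfrange_section(["<0001> <0003> [<0041> <0042>"]),
}
```

Each fixture can then be fed to the parsing code under test, with assertions on the resulting mapping and, for the large-value case, on the number of entries produced.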

Here are a few starting points which might help:

nsfisis (Author) commented Mar 1, 2025

Thank you, @k00ni, I'll try it!

Development

Successfully merging this pull request may close these issues.

Allowed memory exhausted when parse the PDF file.